(57) Abstract

A closed-loop, multimode, predictive speech coder 
includes a codec (100, 200) configured to operate in any 
of several coding modes, and a closed-loop mode decision 
module configured to apply a lowest-bit-rate coding mode 
to an input speech frame. A performance measure of the 
codec is obtained and compared with a threshold value. If 
the performance measure does not exceed the threshold value, 
the lowest-bit-rate coding mode is rejected in favor of a 
coding mode with a higher bit rate. The process can be 
continued until the coding performance is satisfactory. A 
higher-bit-rate, direct coding mode may be applied after a 
lower-bit-rate, prediction-based coding mode has failed to 
perform satisfactorily.

CLOSED-LOOP VARIABLE-RATE MULTIMODE PREDICTIVE SPEECH CODER



BACKGROUND OF THE INVENTION 



I. Field of the Invention 

The present invention pertains generally to the field of speech processing,
and more specifically to closed-loop, variable-rate, multimode, predictive coding of
speech.

II. Background of Invention 


Transmission of voice by digital techniques has become widespread,
particularly in long distance and digital radio telephone applications. This, in turn,
has created interest in determining the least amount of information that can be sent
over a channel while maintaining the perceived quality of the reconstructed
speech. If speech is transmitted by simply sampling and digitizing, a data rate on
the order of sixty-four kilobits per second (kbps) is required to achieve the speech
quality of a conventional analog telephone. However, through the use of speech
analysis, followed by the appropriate coding, transmission, and resynthesis at the
receiver, a significant reduction in the data rate can be achieved.

Devices that employ techniques to compress speech by extracting
parameters that relate to a model of human speech generation are called speech
coders. A speech coder divides the incoming speech signal into blocks of time, or
analysis frames. Speech coders typically comprise an encoder and a decoder, or a
codec. The encoder analyzes the incoming speech frame to extract certain relevant
parameters, and then quantizes the parameters into binary representation, i.e., to a
set of bits or a binary data packet. The data packets are transmitted over the
communication channel to a receiver and a decoder. The decoder processes the
data packets, unquantizes them to produce the parameters, and then resynthesizes
the speech frames using the unquantized parameters.

The function of the speech coder is to compress the digitized speech signal
into a low-bit-rate signal by removing all of the natural redundancies inherent in
speech. The digital compression is achieved by representing the input speech
frame with a set of parameters and employing quantization to represent the
parameters with a set of bits. If the input speech frame has a number of bits $N_i$ and
the data packet produced by the speech coder has a number of bits $N_o$, the
compression factor achieved by the speech coder is $C_r = N_i/N_o$. The challenge is to
retain high voice quality of the decoded speech while achieving the target
compression factor. The performance of a speech coder depends on (1) how well
the speech model, or the combination of the analysis and synthesis process
described above, performs, and (2) how well the parameter quantization process is
performed at the target bit rate of $N_o$ bits per frame. The goal of the speech model
is thus to capture the essence of the speech signal, or the target voice quality, with
a small set of parameters for each frame.

One effective technique to encode speech efficiently at low bit rate is 
multimode coding. A multimode coder applies different modes, or encoding- 
decoding algorithms, to different types of input speech frames. Each mode, or 
encoding-decoding process, is customized to represent a certain type of speech 
segment (i.e., voiced, unvoiced, or background noise) in the most efficient manner. 
An external mode decision mechanism examines the input speech frame and makes
a decision regarding which mode to apply to the frame. Typically, the mode
decision is done in an open-loop fashion by extracting a number of parameters out 
of the input frame and evaluating them to make a decision as to which mode to 
apply. Thus, the mode decision is made without knowing in advance the exact 
condition of the output speech, i.e., how similar the output speech will be to the 
input speech in terms of voice-quality or any other performance measure. An 
exemplary open-loop mode decision for a speech codec is described in U.S. Patent 
No. 5,414,796, which is assigned to the assignee of the present invention and fully 
incorporated herein by reference. 

Multimode coding can be fixed-rate, using the same number of bits $N_o$ for
each frame, or variable-rate, in which different bit rates are used for different
modes. The goal in variable-rate coding is to use only the number of bits needed to
encode the codec parameters to a level adequate to obtain the target quality. As a
result, the same target voice quality as that of a fixed-rate, higher-rate coder can be
obtained at a significantly lower average rate using variable-bit-rate (VBR)
techniques. Conventional VBR speech coders are designed with modes having
different bit rates. An exemplary variable rate speech coder is described in U.S.
Patent No. 5,414,796, assigned to the assignee of the present invention and
previously fully incorporated herein by reference. The codec described in the
aforesaid patent has the following four rates: (1) full rate (FR); (2) half rate (HR);
(3) quarter rate (QR); and (4) eighth rate (ER). For the foregoing rates, each frame
of speech is encoded by 160, eighty, forty, and twenty bits per frame, respectively.
An external open-loop mode decision is made regarding which mode (FR, HR, QR
or ER) to apply to the input speech frame.

There is presently a surge of research interest and strong commercial needs 
to develop a high-quality speech coder operating at medium to low bit rates (i.e., in 
the range of 2.4 to 4 kbps and below). The application areas include wireless 
telephony, satellite communications, Internet telephony, various multimedia and 
voice-streaming applications, voice mail, and other voice storage systems. The 
driving forces are the need for high capacity and the demand for robust 
performance under packet loss situations. Various recent speech coding 
standardization efforts are another direct driving force propelling research and 
development of low-rate speech coding algorithms. A low-rate speech coder 
creates more channels, or users, per allowable application bandwidth, and a low- 
rate speech coder coupled with an additional layer of suitable channel coding can 
fit the overall bit-budget of coder specifications and deliver a robust performance 
under channel error conditions. 

Conventional speech coders typically use some form of prediction
mechanism to encode the current frame. Thus, to encode the current frame, a
speech coder exploits the information contained in the last decoded and
recreated frame. This works well because there is typically strong correlation, or
similarity, between successive frames. Thus, a frame or short segment of speech,
$S_{\text{cur}}(n)$, where n=1,2,...,N, having N samples can be encoded by a predictive
method to form the encoded frame, $S_{\text{cur\_quantized}}(n)$, according to the following
equation:

$$S_{\text{cur\_quantized}}(n) = S_{\text{cur\_predicted}}(n) + E_{\text{cur\_quantized}}(n) = S_{\text{prev\_quantized}}(n) * P(n) + E_{\text{cur\_quantized}}(n),$$

where "*" represents a convolution operation, P(n) is a conventional prediction
filter that produces an approximation of the current frame from the past quantized frame
(i.e., $S_{\text{cur\_predicted}}(n) = S_{\text{prev\_quantized}}(n) * P(n)$), and $E_{\text{cur\_quantized}}(n)$ is the
quantized version of the prediction error $E_{\text{cur}}(n)$ of the current frame. The
prediction error is defined as $E_{\text{cur}}(n) = S_{\text{cur}}(n) - S_{\text{cur\_predicted}}(n)$.
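The predictive relation above can be illustrated with a minimal Python sketch. The uniform scalar quantizer `quantize`, its step size `step`, and the filter taps `p` are hypothetical stand-ins chosen for illustration; the source does not specify a particular prediction filter or quantizer, and the convolution here is simply truncated to the frame length.

```python
def convolve(x, h):
    """Causal convolution of x with filter taps h, truncated to len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def quantize(values, step):
    """Hypothetical uniform scalar quantizer, standing in for a real codebook."""
    return [step * round(v / step) for v in values]


def encode_predictive(s_cur, s_prev_quantized, p, step=0.25):
    """Return (E_cur, E_cur_quantized, S_cur_quantized) for one frame."""
    # S_cur_predicted(n) = S_prev_quantized(n) * P(n)
    s_cur_predicted = convolve(s_prev_quantized, p)
    # E_cur(n) = S_cur(n) - S_cur_predicted(n)
    e_cur = [s - sp for s, sp in zip(s_cur, s_cur_predicted)]
    e_cur_quantized = quantize(e_cur, step)
    # S_cur_quantized(n) = S_cur_predicted(n) + E_cur_quantized(n)
    s_cur_quantized = [sp + eq for sp, eq in zip(s_cur_predicted, e_cur_quantized)]
    return e_cur, e_cur_quantized, s_cur_quantized
```

With a coarse step the reconstructed frame drifts from the original, and that drifted frame becomes the memory used to predict the next frame.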
The performance of the prediction scheme is often measured by a signal-to-
noise ratio (SNR) or a perceptual SNR (PSNR), typically defined as:

$$\text{PSNR} = 10 \log_{10}\left( \frac{\sum_{n=1}^{N} W(n)\, S_{\text{cur}}(n)\, S_{\text{cur}}(n)}{\sum_{n=1}^{N} W(n)\, N_{\text{cur}}(n)\, N_{\text{cur}}(n)} \right),$$

where W(n), for n=1,2,...,N, is a perceptual weight factor and $N_{\text{cur}}(n)$ is the error of
the overall coding process. The error of the overall coding process is defined as
$N_{\text{cur}}(n) = S_{\text{cur}}(n) - S_{\text{cur\_quantized}}(n)$. For ordinary SNR, W(n) is set equal to 1 for all
n=1,2,...,N.
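As a sketch of this measure, the following Python function computes the PSNR of one frame in decibels. The function name `psnr` and the handling of the zero-error and zero-signal cases are assumptions made for the example; with the weights omitted it reduces to the ordinary SNR.

```python
import math


def psnr(s_cur, s_cur_quantized, w=None):
    """Perceptually weighted SNR in dB; w=None means W(n) = 1 (ordinary SNR)."""
    if w is None:
        w = [1.0] * len(s_cur)
    # Numerator: weighted energy of the original frame.
    signal = sum(wn * s * s for wn, s in zip(w, s_cur))
    # Denominator: weighted energy of N_cur(n) = S_cur(n) - S_cur_quantized(n).
    noise = sum(wn * (s - q) ** 2 for wn, s, q in zip(w, s_cur, s_cur_quantized))
    if noise == 0.0:
        return math.inf   # assumption: perfect reconstruction reported as +inf
    if signal == 0.0:
        return -math.inf  # assumption: all-zero input with nonzero error
    return 10.0 * math.log10(signal / noise)
```

For example, a frame reconstructed with a uniform 10 percent amplitude error yields about 20 dB.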

If the error $N_{\text{cur}}$ decreases, the performance of the prediction-based speech
coding scheme, or the SNR, will increase. It is therefore advantageous to minimize
the error $N_{\text{cur}}$. The equation

$$N_{\text{cur}}(n) = S_{\text{cur}}(n) - S_{\text{cur\_quantized}}(n) = [S_{\text{cur}}(n) - S_{\text{cur\_predicted}}(n)] - E_{\text{cur\_quantized}}(n) = E_{\text{cur}}(n) - E_{\text{cur\_quantized}}(n)$$

(i.e., the prediction error less its quantized version, which is the error in the
quantization of the prediction-error signal) indicates that the overall error $N_{\text{cur}}$
depends on how well the prediction is performed, and how well the prediction
error is quantized.

The prediction filter information is necessarily sent to the decoder as a
certain number of bits, $N_p$. The remaining available bits, $N_o - N_p$, can be used to
encode the prediction error signal $E_{\text{cur}}$. If the prediction from the quantized past
frame, $S_{\text{prev\_quantized}}$, generates an excellent predicted representation $S_{\text{cur\_predicted}}$ of the
current frame $S_{\text{cur}}$, the prediction error $E_{\text{cur}}$ will be small, having a low dynamic
range. Hence, it will be relatively easy to encode the prediction error $E_{\text{cur}}$ with a
small number of bits.

For high-bit-rate predictive speech coders such as, e.g., the QCELP 13k
vocoder manufactured by QUALCOMM INCORPORATED, the total number of
bits per frame, $N_o$, is high. The QCELP, for example, supports 260 bits per 20-ms
frame. Therefore, even after allocating a number of bits, $N_p$, to quantize the
prediction filter parameter, there are enough remaining bits, $N_o - N_p$, to accurately
encode the prediction error. However, at low bit rates (e.g., 4 kbps and below), the
total number of bits available (i.e., eighty or fewer per frame) is not large enough to
accurately encode both the prediction filter parameters and the prediction error
signal. Consequently, the overall coding error $N_{\text{cur}}$ grows large, resulting in poor
performance and producing a quantized version $S_{\text{cur\_quantized}}$ of the current frame
that could be quite different from the original frame $S_{\text{cur}}$. As the encoding of the
next frame depends upon how well the current frame is encoded, the poor
performance can degrade the performance of prediction of future frames as well.
Thus, there is a need for a variable-rate, multimode, predictive coder that is
capable of producing high voice quality at low bit rates.

SUMMARY OF THE INVENTION 

The present invention is directed to a variable-rate, multimode, predictive
coder that is capable of producing high-voice-quality at low bit rates. Accordingly, 
in one aspect of the invention, a speech coder advantageously includes a codec 
configured to operate in at least one of a plurality of coding modes; and a closed- 
loop mode decision module coupled to the codec and configured to apply a first 
coding mode from the plurality of coding modes to an input speech frame, the first 
coding mode having a first bit rate that is lower than the bit rate of any other 
coding mode of the plurality of coding modes, the closed-loop mode decision 
module being further configured to obtain a performance measure of the codec, 
compare the performance measure with a threshold value, and, if the performance 
measure does not exceed the threshold value, reject the first coding mode in favor 
of a second coding mode having a second bit rate that is greater than the first bit 
rate. 

In another aspect of the invention, a method of coding speech frames 
advantageously includes the steps of selecting a first coding mode to apply to a 
speech frame, the first coding mode having a first bit rate; obtaining a coding 
performance measure; comparing the coding performance measure with a 
threshold value; and rejecting the first coding mode in favor of a second coding 
mode if the coding performance measure does not exceed the threshold value, the 
second coding mode having a second bit rate that exceeds the first bit rate. 

In another aspect of the invention, a speech coder advantageously includes 
means for selecting a first coding mode to apply to a speech frame, the first coding 
mode having a first bit rate; means for obtaining a coding performance measure; 
means for comparing the coding performance measure with a threshold value; and 
means for rejecting the first coding mode in favor of a second coding mode if the
coding performance measure does not exceed the threshold value, the second
coding mode having a second bit rate that exceeds the first bit rate. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of a communication channel terminated at each 
end by speech coders. 

FIG. 2 is a block diagram of an encoder. 

FIG. 3 is a block diagram of a decoder. 
FIG. 4 is a flow chart illustrating the steps of a closed-loop, multimode,
predictive coding technique for speech frames at low bit rates.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

In FIG. 1 a first encoder 10 receives digitized speech samples s(n) and
encodes the samples s(n) for transmission on a transmission medium 12, or
communication channel 12, to a first decoder 14. The decoder 14 decodes the
encoded speech samples and synthesizes an output speech signal $s_{\text{SYNTH}}(n)$. For
transmission in the opposite direction, a second encoder 16 encodes digitized
speech samples s(n), which are transmitted on a communication channel 18. A
second decoder 20 receives and decodes the encoded speech samples, generating a
synthesized output speech signal $s_{\text{SYNTH}}(n)$.

The speech samples s(n) represent speech signals that have been digitized
and quantized in accordance with any of various methods known in the art
including, e.g., pulse code modulation (PCM), companded μ-law, or A-law. As
known in the art, the speech samples s(n) are organized into frames of input data
wherein each frame comprises a predetermined number of digitized speech
samples s(n). In an exemplary embodiment, a sampling rate of 8 kHz is employed,
with each 20 ms frame comprising 160 samples. In the embodiments described
below, the rate of data transmission may advantageously be varied on a frame-to-
frame basis from 8 kbps (full rate) to 4 kbps (half rate) to 2 kbps (quarter rate) to 1
kbps (eighth rate). Varying the data transmission rate is advantageous because
lower bit rates may be selectively employed for frames containing relatively less
speech information. As understood by those skilled in the art, other sampling
rates, frame sizes, and data transmission rates may be used.
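The arithmetic of the exemplary embodiment can be checked with a short Python sketch. The constant and function names are illustrative only, and dropping a trailing partial frame is an assumption made to keep the example small.

```python
SAMPLE_RATE_HZ = 8000  # exemplary sampling rate
FRAME_MS = 20          # exemplary frame duration
RATES_KBPS = {"full": 8, "half": 4, "quarter": 2, "eighth": 1}


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of samples in one frame (8000 Hz * 20 ms = 160)."""
    return sample_rate_hz * frame_ms // 1000


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """Bits available per frame: kbit/s times ms gives bits."""
    return rate_kbps * frame_ms


def split_into_frames(samples, frame_len):
    """Split samples into whole frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]
```

At these settings the four rates correspond to 160, 80, 40, and 20 bits per 20 ms frame.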



WO 00/30075 PCT/US99/26850

The first encoder 10 and the second decoder 20 together comprise a first
speech coder, or speech codec. Similarly, the second encoder 16 and the first
decoder 14 together comprise a second speech coder. It is understood by those of
skill in the art that speech coders may be implemented with a digital signal
processor (DSP), an application-specific integrated circuit (ASIC), discrete gate
logic, firmware, or any conventional programmable software module and a
microprocessor. The software module could reside in RAM memory, flash
memory, registers, or any other form of writable storage medium known in the art.
Alternatively, any conventional processor, controller, or state machine could be
substituted for the microprocessor. Exemplary ASICs designed specifically for
speech coding are described in U.S. Patent No. 5,727,123, assigned to the assignee
of the present invention and fully incorporated herein by reference, and U.S.
Application Serial No. 08/197,417, entitled VOCODER ASIC, filed February 16,
1994, assigned to the assignee of the present invention, and fully incorporated
herein by reference.

In FIG. 2 an encoder 100 that may be used in a speech coder includes a
mode decision module 102, a pitch estimation module 104, an LP analysis module
106, an LP analysis filter 108, an LP quantization module 110, and a residue
quantization module 112. Input speech frames s(n) are provided to the mode
decision module 102, the pitch estimation module 104, the LP analysis module 106,
and the LP analysis filter 108. The mode decision module 102 produces a mode
index $I_M$ and a mode M based upon the periodicity of each input speech frame s(n).
Various methods of classifying speech frames according to periodicity are
described in U.S. Application Serial No. 08/815,354, entitled METHOD AND
APPARATUS FOR PERFORMING REDUCED RATE VARIABLE RATE
VOCODING, filed March 11, 1997, assigned to the assignee of the present
invention, and fully incorporated herein by reference. Such methods are also
incorporated into the Telecommunication Industry Association Industry Interim
Standards TIA/EIA IS-127 and TIA/EIA IS-733.
The pitch estimation module 104 produces a pitch index $I_P$ and a lag value
$P_0$ based upon each input speech frame s(n). The LP analysis module 106 performs
linear predictive analysis on each input speech frame s(n) to generate an LP
parameter a. The LP parameter a is provided to the LP quantization module 110.
The LP quantization module 110 also receives the mode M. The LP quantization
module 110 produces an LP index $I_{LP}$ and a quantized LP parameter $\hat{a}$. The LP
analysis filter 108 receives the quantized LP parameter $\hat{a}$ in addition to the input
speech frame s(n). The LP analysis filter 108 generates an LP residue signal R[n],
which represents the error between the input speech frames s(n) and the quantized
linear predicted parameters $\hat{a}$. The LP residue R[n], the mode M, and the
quantized LP parameter $\hat{a}$ are provided to the residue quantization module 112.
Based upon these values, the residue quantization module 112 produces a residue
index $I_R$ and a quantized residue signal $\hat{R}[n]$.

In FIG. 3 a decoder 200 that may be used in a speech coder includes an LP 
parameter decoding module 202, a residue decoding module 204, a mode decoding 
module 206, and an LP synthesis filter 208. The mode decoding module 206 
receives and decodes a mode index $I_M$, generating therefrom a mode M. The LP
parameter decoding module 202 receives the mode M and an LP index $I_{LP}$. The LP
parameter decoding module 202 decodes the received values to produce a
quantized LP parameter $\hat{a}$. The residue decoding module 204 receives a residue
index $I_R$, a pitch index $I_P$, and the mode index $I_M$. The residue decoding module 204
decodes the received values to generate a quantized residue signal $\hat{R}[n]$. The
quantized residue signal $\hat{R}[n]$ and the quantized LP parameter $\hat{a}$ are provided to
the LP synthesis filter 208, which synthesizes a decoded output speech signal $\hat{s}[n]$
therefrom. 

Operation and implementation of the various modules of the encoder 100 of 
FIG. 2 and the decoder of FIG. 3 are known in the art, and are described in detail in 
L.R. Rabiner & R.W. Schafer, Digital Processing of Speech Signals 396-453 (1978),
which is fully incorporated herein by reference. An exemplary encoder and an 
exemplary decoder are described in U.S. Patent No. 5,414,796, previously fully 
incorporated herein by reference. 

In one embodiment a multimode coder first uses an open-loop decision 
mode, relying on parameters extracted out of the current frame to classify the 
current frame as background-noise/silence (N), unvoiced speech (UV), or voiced 
speech (V). Various speech classification methods used for rate determination are 
known in the art, including methods described in the aforementioned U.S. Patent 
No. 5,414,796, previously fully incorporated herein by reference. N-type frames 
are coded with an eighth-rate mode, and UV-type frames are coded with a quarter- 
rate mode. 

For V-type frames (i.e., voiced speech frames), either a higher-rate ($N_o = N_1$
bits per frame) mode such as full rate, or a lower-rate ($N_o = N_2$ bits per frame,
where $N_2 < N_1$) mode such as half rate, is used. The full-rate mode may
advantageously be a prediction-based coding scheme with adequate bits to
accurately encode various types of voiced speech, delivering a perceptual signal-
to-noise ratio (PSNR) well above the target PSNR (a predefined or variable
threshold value). The half-rate mode is advantageously a prediction-based coding
scheme designed to encode frames having a high degree of correlation with the previous
frame (i.e., frames that are quite similar to the previous frame). Thus, the number
of bits available in the half-rate mode, $N_2$ bits per frame, is adequate to encode the
prediction parameters for frames with high correlation, as well as the prediction
error, which is relatively small due to the high correlation between successive
frames. Such frames are typically encountered in steady voiced speech segments,
which are therefore amenable to half-rate coding. Additionally, the performance
of prediction-based coding schemes also depends on how accurately the previous
frame is quantized. Hence, a closed-loop mode selection process is employed after
the open-loop mode to ensure that the coding performance exceeds the predefined
(or variable) target PSNR value. As those of skill in the art would understand, the
open-loop mode need not necessarily be applied at all.

The flow chart of FIG. 4 illustrates a closed-loop, multimode, predictive 
coding technique for speech frames at low bit rates, in accordance with one 
embodiment. In step 300 a frame number counter is set equal to 1. The algorithm 
then proceeds to step 302, starting the coding process. The algorithm then 
proceeds to step 304. In step 304 the algorithm checks the current frame and the 
previous quantized frame. The algorithm then proceeds to step 306. In step 306 
the algorithm determines whether the current frame should be classified as silence 
or background noise. This determination is made in accordance with various 
conventional techniques for measuring frame energy, such as, e.g., calculating the 
sum-of-squares. If the frame is classified as silence or background noise, the 
algorithm proceeds to step 308. In step 308 the algorithm applies an eighth-rate 
coding mode to the frame. The algorithm then proceeds to step 310. If, on the 
other hand, in step 306 the frame is not classified as background noise or silence, 
the algorithm proceeds to step 312. 
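The energy test of step 306 can be sketched as follows in Python. The sum-of-squares measure is the one named in the text; the comparison against a fixed `energy_threshold` is a hypothetical decision rule, since the source does not give a threshold or say how it is chosen.

```python
def frame_energy(frame):
    """Sum-of-squares energy of one frame of samples."""
    return sum(x * x for x in frame)


def is_silence_or_noise(frame, energy_threshold):
    """Hypothetical rule: low-energy frames are treated as silence or background noise."""
    return frame_energy(frame) < energy_threshold
```

A practical coder would likely adapt the threshold to the background level rather than fix it.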

In step 312 the algorithm determines whether the current frame should be 
classified as unvoiced speech. This determination is made in accordance with 
various known methods of periodicity determination, such as, e.g., the use of zero 
crossings and normalized autocorrelation functions (NACFs). These techniques 
are described in the aforementioned U.S. Application Serial No. 08/815,354, 
previously fully incorporated herein by reference. If the frame is classified as 
unvoiced speech, the algorithm proceeds to step 314. In step 314 a quarter-rate
coding mode is applied to the frame. The algorithm then proceeds to step 310. If,
on the other hand, in step 312 the frame is not classified as unvoiced speech, the
algorithm proceeds to step 316, considering the frame to contain voiced speech. In
step 316 the algorithm goes to a half-rate prediction-based coding mode. The
algorithm then proceeds to step 318. In step 318 the PSNR is computed. The
algorithm then proceeds to step 320.
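The two periodicity measures named for step 312, zero crossings and the normalized autocorrelation function (NACF), can be sketched in Python as below. How the two measures are combined in `looks_unvoiced`, and the thresholds `zc_threshold` and `nacf_threshold`, are assumptions for illustration; the referenced application describes the actual classification methods.

```python
import math


def zero_crossing_count(frame):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def nacf(frame, lag):
    """Normalized autocorrelation of the frame at the given lag (range -1 to 1)."""
    x, y = frame[lag:], frame[:len(frame) - lag]
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0


def looks_unvoiced(frame, lags, zc_threshold, nacf_threshold):
    """Hypothetical rule: many zero crossings and no strong periodicity at any lag."""
    peak = max(nacf(frame, lag) for lag in lags)
    return zero_crossing_count(frame) > zc_threshold and peak < nacf_threshold
```

A strongly periodic frame gives an NACF near 1 at its pitch lag, so it fails the unvoiced test regardless of its zero-crossing count.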

In step 320 the algorithm determines whether the computed PSNR is greater
than a predefined threshold, or target, PSNR value. As an alternative, the
threshold, or target, PSNR value may be a function of average bit rate. For
example, the average bit rate is calculated periodically and fed back to the
algorithm, which adjusts the target threshold value accordingly. Further, it should
be understood that any conventional measure of performance may be substituted
for PSNR. If the computed PSNR exceeds the target PSNR, the algorithm proceeds
to step 322. In step 322 a half-rate coding mode is applied to the frame. The
algorithm then proceeds to step 310. If, on the other hand, in step 320 the
computed PSNR does not exceed the target PSNR, the algorithm proceeds to step
324. In step 324 the algorithm applies a full-rate coding mode to the frame. The
algorithm then proceeds to step 310.
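One way the average-bit-rate feedback mentioned above might work is sketched below in Python. The text says only that the target is adjusted according to the average bit rate; the direction of adjustment, the fixed step `step_db`, the budget `budget_kbps`, and the floor `floor_db` are all assumptions introduced for this example.

```python
def adjust_target_psnr(target_db, average_kbps, budget_kbps,
                       step_db=0.5, floor_db=0.0):
    """Hypothetical feedback rule for the target PSNR of step 320.

    If the running average rate is over budget, relax the target so that more
    frames are accepted at half rate; if under budget, tighten it.
    """
    if average_kbps > budget_kbps:
        return max(floor_db, target_db - step_db)
    if average_kbps < budget_kbps:
        return target_db + step_db
    return target_db
```

Under this rule the threshold trades voice quality against average rate over time, which is one plausible reading of the feedback described.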

In step 310 the frame number counter is incremented by 1. The algorithm
then proceeds to step 326. In step 326 the algorithm determines whether the frame
number counter value is greater than or equal to the total number of frames that
must be processed (i.e., whether there are any remaining frames to process). If the
frame number counter value is less than the total number of frames to be
processed, the algorithm returns to step 302, beginning the coding process for the
next frame. If, on the other hand, the frame number counter value is greater than
or equal to the total number of frames to be processed, the algorithm proceeds to
step 328, ending the coding process.
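The per-frame decision of FIG. 4 can be summarized in a short Python sketch. The classifiers `is_noise` and `is_unvoiced` and the trial coder `half_rate_psnr` are passed in as hypothetical callables, because the source leaves their implementations to known techniques; only the branching order and the mode names come from the text.

```python
def choose_mode(frame, prev_quantized, is_noise, is_unvoiced,
                half_rate_psnr, target_psnr):
    """Return the coding mode for one frame, following the branches of FIG. 4."""
    if is_noise(frame):            # step 306: silence or background noise?
        return "eighth"            # step 308
    if is_unvoiced(frame):         # step 312: unvoiced speech?
        return "quarter"           # step 314
    # Step 316 onward: trial half-rate predictive coding, then measure its PSNR.
    measured = half_rate_psnr(frame, prev_quantized)
    if measured > target_psnr:     # step 320: closed-loop check
        return "half"              # step 322
    return "full"                  # step 324


def choose_modes(frames, **kwargs):
    """Apply choose_mode to each frame in turn (previous-frame memory omitted)."""
    return [choose_mode(f, None, **kwargs) for f in frames]
```

Only the voiced branch is closed-loop here: the half-rate result is kept when its measured PSNR exceeds the target, and full rate is used otherwise.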

In alternate embodiments the full-rate coding mode described above with
respect to FIG. 4 could be a higher-bit-rate predictive mechanism (i.e., any bit rate
that is greater than half-rate). In one embodiment a higher-bit-rate, direct coding
mechanism is substituted for the full-rate, predictive coding mode. The direct
coding mode encodes the current speech frame or residue without using any
information from the previous frame.

The use of a direct encoding method is appropriate for speech segments for
which there is no similarity between the current frame and the previous frame. An
example is during the onset of a voice segment. Another example is unvoiced-to-
voiced segment transitions. A direct encoding method is also useful in the middle
of voiced segments when the cumulative effect of prediction-based encoding has 
degraded the past quantized frame so as to be too far out of sync with the 
corresponding original speech frame. In this case predictive coding will fail, even 
at much higher bit rates, due to the lack of similarity between the past quantized 
frame and the past original frame. In such a case, a fresh capture of the current 
frame with a direct encoding method will not only enhance the preservation of the 
current frame, but will also facilitate future prediction-based encoding of the next 
and later frames because the prediction mechanism will be aided by a more 
accurate memory. 

Those of skill would understand that while the embodiments described 
above contemplate four bit rates, any reasonable number of bit rates could be 
substituted for four. Those of skill would further appreciate that the embodiments 
described herein could be extended to analysis over a number of frames that is 
greater than one, at the expense of additional processing time or capability. 

In one embodiment two modes may be employed, with bit rates R1 and R2.
The R1 coding method is a higher-rate, direct coding method. The R2 coding
method is a lower-rate, predictive coding method. A closed-loop decision is
performed such that the R2 coding method is tried first, the performance is
checked by comparing a performance measure with a threshold, and the algorithm switches to
the R1 coding method if the performance for the R2 coding mode is insufficient. In
an alternate embodiment, the higher-rate, R1 coding mode is tried first, the
performance is checked by comparing a performance measure with a threshold, and, if the
performance is satisfactory, the lower-rate, R2 coding mode is tried. The
performance check is then performed for the R2 coding mode, and if the R2 coding
mode performance is unsatisfactory, the R1 coding mode is applied to the frame.

In another embodiment multiple coding modes having bit rates
R1,R2,...,RN-1,RN (where R1>R2>...>RN-1>RN) are employed. A closed-loop
decision is performed such that the lowest rate, RN, is tried first. If the RN coding
mode performs adequately, the RN coding mode is retained for the frame.
Otherwise, the next, higher-rate coding mode, RN-1, is applied. The process is
reiterated until either a coding mode performs adequately or the highest-rate
mode, R1, is retained. In an alternate embodiment, the highest rate, R1, is tried
first. If the R1 mode performs adequately, the next, lower-rate coding mode, R2, is
tried. The process is continued until a given coding mode does not perform
adequately (at which time the last coding mode to perform adequately is applied),
or until the lowest-rate coding mode, RN, performs satisfactorily and is applied.
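Both search orders can be sketched in Python as below. The callable `performance` (returning, for example, a PSNR for a trial encoding in a given mode) is hypothetical, and "performs adequately" is taken to mean that the measure exceeds `threshold`. In the highest-rate-first variant, retaining R1 when even R1 misses the threshold is an assumption, since the text does not cover that case.

```python
def lowest_rate_first(modes, performance, threshold):
    """modes is ordered from the highest rate R1 down to the lowest rate RN."""
    for mode in reversed(modes[1:]):        # try RN, RN-1, ..., R2
        if performance(mode) > threshold:
            return mode
    return modes[0]                         # nothing passed: R1 is retained


def highest_rate_first(modes, performance, threshold):
    """Step down from R1 while each mode still performs adequately."""
    chosen = modes[0]                       # assumption: R1 is the fallback
    for mode in modes:
        if performance(mode) > threshold:
            chosen = mode                   # last mode to perform adequately
        else:
            break
    return chosen
```

When performance falls steadily with bit rate the two orders select the same mode; they differ mainly in how many trial encodings are needed.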

In another embodiment multiple coding modes having bit rates
R1,R2,...,Rm-1,Rm,Rm+1,...,RN are employed. The bit rates have the following
relationship: R1>R2>...>Rm-1>Rm>Rm+1>...>RN. A closed-loop mode decision
works in conjunction with an open-loop mode decision. The open-loop mode
decision, based upon parameters such as frame energy or frame periodicity, tells
the coder to apply a mode with a bit rate of Rm, at which point the closed-loop
mode decision takes over. The closed-loop mode decision applies the Rm coding
mode, tests performance, and maintains the Rm coding mode if performance is
satisfactory. Otherwise, the closed-loop mode decision tries the next, higher-rate
coding mode, Rm-1. The process is reiterated until either a coding mode performs
adequately or the highest-rate mode, R1, is retained. Alternatively, the closed-loop
mode decision applies the Rm coding mode, tests performance, and maintains the
Rm coding mode if performance is satisfactory. Otherwise, the closed-loop mode
decision tries the next, lower-rate coding mode, Rm+1. The process is reiterated
until either a coding mode performs inadequately (at which time the last coding
mode to perform adequately is applied), or the lowest-rate mode, RN, is retained.
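The first variant above, in which the open-loop decision supplies a starting mode Rm and the closed-loop decision climbs toward R1, might be sketched in Python as follows. The zero-based `start_index` and the callable `performance` are illustrative assumptions; the downward-stepping alternative is not sketched here.

```python
def climb_from_open_loop(modes, start_index, performance, threshold):
    """modes is ordered R1 (highest rate) to RN; start_index points at Rm."""
    for i in range(start_index, 0, -1):     # try Rm, Rm-1, ..., R2
        if performance(modes[i]) > threshold:
            return modes[i]                 # first adequate mode is maintained
    return modes[0]                         # otherwise R1 is retained
```

Seeding the search at Rm avoids trial encodings at rates the open-loop classifier has already ruled out as too low.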

In another embodiment multiple coding modes having bit rates 
R1,R2,...,RN (where R1>R2>...>RN) are employed. All of the coding modes are 
applied in parallel to the input speech frame, and the performances of the coding 
modes are compared with a set of N threshold performance measures. The coding 
mode that appears to produce the most accurate result is selected. 

In another embodiment multiple coding modes having bit rates 
R1,R2,...,RN (where R1>R2>...>RN) are employed. All of the coding modes are 
applied in parallel to the input speech frame, and the performances of the coding 
modes are compared with a set of N threshold performance measures. If several 
coding modes exceed the performance threshold target, the coding mode having 
the lowest bit rate (and also performing above the performance threshold) is 
selected. 
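The parallel variant that prefers the lowest passing bit rate can be sketched in Python as below. The inputs are assumed to be parallel lists indexed by mode, with one threshold per mode, and falling back to the highest-rate mode when no mode passes is an assumption not stated in the text.

```python
def select_parallel(rates_kbps, performances, thresholds):
    """Return the index of the lowest-rate mode whose performance exceeds its threshold."""
    indices = range(len(rates_kbps))
    passing = [i for i in indices if performances[i] > thresholds[i]]
    if not passing:
        # Assumption: with no passing mode, fall back to the highest bit rate.
        return max(indices, key=lambda i: rates_kbps[i])
    return min(passing, key=lambda i: rates_kbps[i])
```

Running every mode on every frame costs more computation than a sequential search, in exchange for a decision that does not depend on the order in which modes are tried.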

In another embodiment multiple coding modes having bit rates 
R1,R2,...,Half Rate,...,Quarter Rate,...,RN (where R1 is Full Rate and RN is Eighth 
Rate) are employed. A closed-loop mode decision works in conjunction with an 
open-loop mode decision. The open-loop mode decision, based upon parameters 
such as frame energy or frame periodicity, tells the coder to apply the full-rate 
coding mode to unvoiced-to-voiced transition frames, voiced-to-voiced transition 
frames, nonstationary voiced segments, and nonstationary unvoiced segments. 
Also based upon frame parameters, the open-loop mode decision tells the coder to 
apply the half-rate coding mode to steady voiced segments that exhibit a 
significant degree of similarity from frame to frame. Also based upon frame 
parameters, the open-loop mode decision tells the coder to apply the quarter-rate 
coding mode to steady unvoiced segments. Also based upon frame parameters, 
the open-loop mode decision tells the coder to apply the eighth-rate coding mode 
to background noise and other nonspeech signals such as silence. Once the 
open-loop mode decision has selected a coding mode for application to the frame, the 
closed-loop mode decision takes over. The closed-loop mode decision applies the 
coding mode selected by the open-loop mode decision, tests performance, and 
maintains the selected coding mode if performance is satisfactory. Otherwise, the 
closed-loop mode decision tries the next, higher-rate coding mode. The process is 
reiterated until either a coding mode performs adequately or the full-rate mode is 
retained. Alternatively, the closed-loop mode decision applies the coding mode 
selected by the open-loop mode decision, tests performance, and maintains the 
selected coding mode if performance is satisfactory. Otherwise, the closed-loop 
mode decision tries the next, lower-rate coding mode. The process is reiterated 
until either a coding mode performs inadequately (at which time the last coding 
mode to perform adequately is applied), or the lowest-rate mode is retained. 
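
The open-loop assignment of frame types to rates described in this embodiment amounts to a lookup table, sketched below. The enum and function names are assumptions introduced for illustration. How a frame is classified from its energy or periodicity is not specified in this passage, so the classification is taken here as an input.

```python
from enum import Enum, auto


class Rate(Enum):
    FULL = auto()
    HALF = auto()
    QUARTER = auto()
    EIGHTH = auto()


class FrameClass(Enum):
    UNVOICED_TO_VOICED_TRANSITION = auto()
    VOICED_TO_VOICED_TRANSITION = auto()
    NONSTATIONARY_VOICED = auto()
    NONSTATIONARY_UNVOICED = auto()
    STEADY_VOICED = auto()
    STEADY_UNVOICED = auto()
    BACKGROUND_NOISE_OR_SILENCE = auto()


# Frame class -> initial coding rate, as listed in the paragraph above.
OPEN_LOOP_RATE = {
    FrameClass.UNVOICED_TO_VOICED_TRANSITION: Rate.FULL,
    FrameClass.VOICED_TO_VOICED_TRANSITION: Rate.FULL,
    FrameClass.NONSTATIONARY_VOICED: Rate.FULL,
    FrameClass.NONSTATIONARY_UNVOICED: Rate.FULL,
    FrameClass.STEADY_VOICED: Rate.HALF,
    FrameClass.STEADY_UNVOICED: Rate.QUARTER,
    FrameClass.BACKGROUND_NOISE_OR_SILENCE: Rate.EIGHTH,
}


def open_loop_mode_decision(frame_class: FrameClass) -> Rate:
    """Return the rate the closed-loop mode decision should try first."""
    return OPEN_LOOP_RATE[frame_class]
```

The closed-loop mode decision would then start from the returned rate and test performance as described.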

In another embodiment a multimode coder includes a first set of N modes, 
Mi, and the first set of modes has respective bit rates Ri, where i=1,2,...,N. The 
coder also has a second set of N modes, MCCi, and the second set of modes has 
respective bit rates RCCi, where i=1,2,...,N. The MCCi and Mi coding modes each 
use the same source-coding mode (i.e., the same encoder and decoder). However, 
the MCCi coding mode includes an additional layer of channel protection, in 
which (RCCi-Ri) bits are used for robust protection of the parameters of the Mi 
coding mode under the worst possible channel condition of the communication 
system. Hence, the performance, or voice quality, delivered by the Mi coding 
mode under channel-error-free conditions is similar to the performance, or voice 
quality, delivered by the MCCi coding mode under the worst possible channel 
error condition. The (RCCi-Ri) channel coding bits serve to provide adequate 
protection under the assumed, or target, worst channel condition. The assumed 
worst channel condition may advantageously be, e.g., a predefined percentage of 
frame error rate (FER). 

In this particular embodiment, a closed-loop mode decision advantageously 
accounts for both channel variation and source variation to deliver a guaranteed 
quality of service. For example, a source-controlled, closed-loop mode decision 
such as described above is applied first. The closed-loop mode decision tells the 
coder to use the Mi coding mode. An external, network-control indicator SW, 
which is a signal provided by the communication network to the speech encoder, 
indicates whether the communication channel is in good condition (e.g., if SW=1, 
the channel is error-free) or in bad condition (e.g., if SW=0, the channel is erroneous). 
If the channel is in good condition, the coding mode Mi, having bit rate Ri, is used. 
If, on the other hand, the channel is in bad condition, the coding mode MCCi, 
having bit rate RCCi, is used. 

Those skilled in the art would appreciate that the number of network 
conditions need not be restricted to two. Thus, in one embodiment, a multimode 
coder is designed to account for j=1,2,...,M different possible network conditions by 
providing M different modes MCCi,j having rates RCCi,j, where j=1,2,...,M, for each 
original source-controlled coding mode Mi. Such a scheme allows for varied 
amounts of channel coding because (RCCi,j-RCCi) represents the minimum 
number of bits needed to add channel error protection to the channel coding layer 
so that the channel error protection will be adequate for the worst-case scenario in 
the j-th channel error condition. The source-controlled, closed-loop mode decision 
then determines which coding mode Mi to apply first, and, based on the value of 
SW=j (where j=1,2,...,M), selects the coding mode MCCi,j. Such a closed-loop, 
combined-network-and-source-controlled codec delivers guaranteed quality of 
service across various channel conditions while also delivering a low average bit 
rate. 
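
The two-stage selection (source-controlled choice of Mi, then network-controlled choice of MCCi,j from SW=j) can be sketched as a table lookup. Everything numeric below is hypothetical: the bit counts, the number of modes and conditions, and the convention that condition j=1 needs no added protection are assumptions made only to give the example concrete values.

```python
from typing import Dict, Tuple

# Hypothetical source-coding bits per frame for each mode Mi (key: i).
SOURCE_BITS: Dict[int, int] = {1: 160, 2: 80}

# Hypothetical channel-protection bits added for mode i under network
# condition j (key: (i, j)). Condition 1 is assumed here to be error-free.
PROTECTION_BITS: Dict[Tuple[int, int], int] = {
    (1, 1): 0, (1, 2): 16, (1, 3): 40,
    (2, 1): 0, (2, 2): 8, (2, 3): 20,
}


def combined_mode_decision(i: int, sw: int) -> Tuple[str, int]:
    """Pick MCCi,j from the source-controlled mode index i and SW = j.

    Returns a label for the selected mode and its total bits per frame.
    """
    if i not in SOURCE_BITS:
        raise ValueError(f"unknown source-controlled mode index: {i}")
    if (i, sw) not in PROTECTION_BITS:
        raise ValueError(f"unknown network condition SW={sw} for mode {i}")
    total_bits = SOURCE_BITS[i] + PROTECTION_BITS[(i, sw)]
    return f"MCC{i},{sw}", total_bits
```

For example, `combined_mode_decision(2, 3)` selects the mode labelled MCC2,3, whose total is the 80 source bits plus the 20 protection bits assumed in the table.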

Preferred embodiments of the present invention have thus been shown and 
described. It would be apparent to one of ordinary skill in the art, however, that 
numerous alterations may be made to the embodiments herein disclosed without 
departing from the spirit or scope of the invention. Therefore, the present 
invention is not to be limited except in accordance with the following claims. 

What is claimed is: 

1. A speech coder, comprising: 

a codec configured to operate in at least one of a plurality of coding modes; and 

a closed-loop mode decision module coupled to the codec and 
configured to apply a first coding mode from the plurality of coding modes to an 
input speech frame, the first coding mode having a first bit rate that is lower than 
the bit rate of any other coding mode of the plurality of coding modes, the closed- 
loop mode decision module being further configured to obtain a performance 
measure of the codec, compare the performance measure with a threshold value, 
and, if the performance measure does not exceed the threshold value, reject the 
first coding mode in favor of a second coding mode having a second bit rate that is 
greater than the first bit rate. 

2. The speech coder of claim 1, wherein the closed-loop mode decision 
module is configured to continue a process of selecting and, based on performance, 
rejecting coding modes chosen successively in order of increasing bit rate. 

3. The speech coder of claim 1, wherein the performance measure 
is obtained by comparing a resultant synthetic speech frame with the input speech 
frame. 

4. The speech coder of claim 1, wherein the first coding mode is a 
prediction-based coding mode and the second coding mode is a direct coding 
mode. 

5. The speech coder of claim 1, further comprising an open-loop mode 
decision module coupled to the codec and configured to select one of the plurality 
of coding modes for application to the input speech frame before the closed-loop 
mode decision module applies a coding mode, wherein the closed-loop mode 
decision module is configured to first apply the coding mode selected by the open- 
loop mode decision module. 

6. The speech coder of claim 2, further comprising an open-loop mode 
decision module coupled to the codec and configured to select one of the plurality 
of coding modes for application to the input speech frame before the closed-loop 
mode decision module applies a coding mode, wherein the closed-loop mode 
decision module is configured to first apply the coding mode selected by the 
open-loop mode decision module. 

7. The speech coder of claim 1, wherein the threshold value is a 
predefined quantity. 

8. The speech coder of claim 1, wherein the threshold value is a function 
of average bit rate. 

9. A method of coding speech frames, comprising the steps of: 

selecting a first coding mode to apply to a speech frame, the first 
coding mode having a first bit rate; 

obtaining a coding performance measure; 

comparing the coding performance measure with a threshold value; and 

rejecting the first coding mode in favor of a second coding mode if 
the coding performance measure does not exceed the threshold value, the second 
coding mode having a second bit rate that exceeds the first bit rate. 

10. The method of claim 9, further comprising the step of repeating the 
obtaining, comparing, and rejecting steps in successive order until the coding 
performance measure exceeds the threshold value. 

11. The method of claim 9, wherein the obtaining step comprises 
comparing a resultant synthetic speech frame with the speech frame. 

12. The method of claim 9, wherein the first coding mode is a 
prediction-based coding mode and the second coding mode is a direct coding mode. 

13. The method of claim 9, wherein the selecting step comprises selecting 
a first coding mode based upon parameters of the speech frame. 

14. The method of claim 10, wherein the selecting step comprises 
selecting a first coding mode based upon parameters of the speech frame. 

15. The method of claim 9, wherein the comparing step comprises 
comparing the coding performance measure with a predefined threshold value. 

16. The method of claim 9, wherein the comparing step comprises 
comparing the coding performance measure with a threshold value that is a 
function of average bit rate. 

17. A speech coder, comprising: 

means for selecting a first coding mode to apply to a speech frame, 
the first coding mode having a first bit rate; 

means for obtaining a coding performance measure; 

means for comparing the coding performance measure with a 
threshold value; and 

means for rejecting the first coding mode in favor of a second coding 
mode if the coding performance measure does not exceed the threshold value, the 
second coding mode having a second bit rate that exceeds the first bit rate. 

18. The speech coder of claim 17, further comprising means for 
continuing to obtain the performance measure, compare the performance measure 
with the threshold value, and reject coding modes in favor of other coding modes 
having greater bit rates until the coding performance measure exceeds the 
threshold value. 

19. The speech coder of claim 17, wherein the means for obtaining 
comprises means for comparing a resultant synthetic speech frame with the speech 
frame. 

20. The speech coder of claim 17, wherein the first coding mode is a 
prediction-based coding mode and the second coding mode is a direct coding 
mode. 

21. The speech coder of claim 17, wherein the means for selecting 
comprises means for selecting a first coding mode based upon parameters of the 
speech frame. 

22. The speech coder of claim 18, wherein the means for selecting 
comprises means for selecting a first coding mode based upon parameters of the 
speech frame. 

23. The speech coder of claim 17, wherein the threshold value is a 
predefined quantity. 

24. The speech coder of claim 17, wherein the threshold value is a 
function of average bit rate. 